Financial factor influence on scaling and memory of trading volume in stock market



Wei Li, 1 Fengzhong Wang, 1 Shlomo Havlin, 1,2 and H. Eugene Stanley 1 
1 Center for Polymer Studies and Department of Physics, 
Boston University, Boston, MA 02215 USA 
2 Department of Physics, Bar-Ilan University, Ramat-Gan 52900, Israel

(Dated: January 27, 2013) 

Abstract 

We study the daily trading volume volatility of 17,197 stocks in the U.S. stock markets during
the period 1989-2008 and analyze the time return intervals $\tau$ between volume volatilities above
a given threshold $q$. For different thresholds $q$, the probability density function $P_q(\tau)$ scales
with the mean interval $\langle\tau\rangle$ as $P_q(\tau) = \langle\tau\rangle^{-1} f(\tau/\langle\tau\rangle)$, and the tails of the scaling
function can be well approximated by a power law $f(x) \sim x^{-\gamma}$. We also study the relation
between the form of the distribution function $P_q(\tau)$ and several financial factors: stock lifetime,
market capitalization, volume, and trading value. We find a systematic tendency of $P_q(\tau)$
associated with these factors, suggesting a multi-scaling feature in the volume return intervals.
We analyze the conditional probability $P_q(\tau|\tau_0)$ for $\tau$ following a certain interval $\tau_0$, and
find that $P_q(\tau|\tau_0)$ depends on $\tau_0$ such that immediately following a short/long return interval
a second short/long return interval tends to occur. We also find indications that there is a
long-term correlation in the daily volume volatility. We compare our results to those found
earlier for price volatility.

PACS numbers: 89.65.Gh, 05.45.Tp, 89.75.Da

I. INTRODUCTION

Because the dynamics of financial markets are of great importance in economics and
econophysics [1-9], the dynamics of both stock price and trading volume have been studied
for decades as a prerequisite to developing effective investment strategies. Econophysics
research has found that the distribution of stock price returns exhibits power-law tails and
that the price volatility time series has long-term power-law correlations [10-21]. To better
understand these scaling features and correlations, Yamasaki et al. [22] and Wang et al.
[24] studied the behavior of price return intervals $\tau$ between volatilities occurring above
a given threshold $q$. For both daily and intraday financial records, they found that (i) the
distribution of the scaled price interval $\tau/\langle\tau\rangle$ can be approximated by a stretched exponential
function, and (ii) the sequence of the price return intervals has a long-term memory related
to the original volatility sequence. The scaling and memory properties of financial records
are similar to those found in climate and earthquake data [25-30].

A feature of the recent history of the stock market has been large price movements
associated with high volume. In the Black Monday stock market crash of 1987, the Dow
Jones Industrial Average (DJIA) plummeted 508 points, losing 22.6 percent of its value in
one day, which led to the pathological situation in which the bid price for a stock actually
exceeded the ask price. In this financial crash approximately $6 \times 10^8$ shares traded, a
one-day trading volume three times that of the entire previous week. Understanding the precise
relationship between price and volume fluctuations has thus been a topic of great interest
in recent research [31, 32]. Trading volume data in itself contains much information about
market dynamics, e.g., the distribution of the daily traded volume displays power-law tails
with an exponent within the Lévy stable domain.
Recently, Ren and Zhou [35]
studied the intraday database of two composite indices and 20 individual indices in the
Chinese stock markets. They found that the intraday volume recurrence intervals show
power-law scaling, short-term correlations, and long-term correlations in each stock index.

In this study we analyze U.S. stock market data over a range broad enough to allow us
to identify how several financial factors significantly affect scaling properties. We study the
daily trading volume volatility return intervals $\tau$ between two successive volume volatilities
above a certain threshold $q$, and find a range of power-law distributions broader than that
found earlier in price volatility return intervals [22, 23]. We find a unique scaling of the
probability density function (PDF) $P_q(\tau)$ for different thresholds $q$. We also perform a
detailed analysis of the relation between volume volatility return intervals and four financial
stock factors: (i) stock lifetime, (ii) market capitalization, (iii) average trading volume, and
(iv) average trading value. We find systematically different power-law exponents for $P_q(\tau)$
when binning stocks according to these four financial factors. Similar to that found for
the Chinese market [35], we find that in the U.S. stock market the conditional probability
distribution $P_q(\tau|\tau_0)$ for $\tau$ following a certain interval $\tau_0$ demonstrates that volume return
intervals are short-term correlated. We also find that the daily volume volatility shows a
stronger long-term correlation for sequences of longer lifetime but no clear changes in
long-term correlations for different stock size factors such as capitalization, volume, and
trading value.



II. DATA 



In order to obtain a sufficiently long time series, we analyze the daily trading volume 
volatility of 17,197 stocks listed in the U.S. stock market for at least 350 days. We obtain 
our data from the Center for Research in Security Prices (CRSP) US stock database, which 
lists the daily prices of all listed NYSE, Amex, and NASDAQ common stocks, along with 
basic market indices. The period we study extends from 1 January 1989 to 31 December 
2008, a total of 5042 trading days. 



III. DISTRIBUTION OF VOLUME VOLATILITY RETURN INTERVALS 



For a stock trading volume time series, in a manner similar to stock price analysis
[23], we define two basic measures: volume return $R$ and volume volatility $v$. The volume
return $R$ is defined as the logarithmic change in the successive daily trading volume for each
stock,

$$R(t) \equiv \ln\left(\frac{V(t)}{V(t-1)}\right), \tag{1}$$

where $V(t)$ is the daily trading volume at time $t$. We define volume volatility to be the
absolute value of the volume return. In order to compare different stocks, we determine the
volume volatility $v(t)$ by dividing the absolute returns $|R(t)|$ by their standard deviation,

$$v(t) \equiv \frac{|R(t)|}{\left(\langle |R(t)|^2 \rangle - \langle |R(t)| \rangle^2\right)^{1/2}}, \tag{2}$$

where $\langle \cdots \rangle$ is the time average for each stock. The threshold $q$ is thus measured in units of
the standard deviation of the absolute volume return.
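The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `volume_volatility` and the toy volume values are assumptions, and the input is assumed to contain strictly positive daily volumes.

```python
import numpy as np

def volume_volatility(volume):
    """Normalized volume volatility v(t) from a daily volume series V(t).

    Assumes every entry of `volume` is strictly positive.
    """
    volume = np.asarray(volume, dtype=float)
    # Eq. (1): logarithmic change between successive daily volumes.
    returns = np.log(volume[1:] / volume[:-1])
    abs_returns = np.abs(returns)
    # Eq. (2): divide |R(t)| by its standard deviation over the record
    # (np.std with the default ddof=0 equals sqrt(<x^2> - <x>^2)).
    return abs_returns / abs_returns.std()

# Toy example with made-up volumes (not CRSP data).
toy_volume = [100.0, 200.0, 100.0, 400.0, 400.0]
v = volume_volatility(toy_volume)
```

Because of the normalization, the returned series has unit standard deviation, so a threshold such as $q = 2$ has the same meaning for every stock.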

For a volume volatility time series, we collect the time intervals $\tau$ between consecutive
volatilities $v(t)$ above a chosen threshold $q$ and construct a new time series of volume return
intervals $\{\tau(q)\}$. Fig. 1(a) shows the dependence of $P_q(\tau)$ on $q$, where $P_q(\tau)$ is the PDF
of the volume volatility return interval time series $\{\tau(q)\}$. $P_q(\tau)$ decays more
slowly for large $q$ than for small $q$: because extreme events are rare in a high-threshold
series, large interval values are more probable for large $q$. We next
determine whether there is any scaling in the distribution by plotting the PDFs of
the volume return intervals $P_q(\tau)$, scaled with the mean volume return interval $\langle\tau(q)\rangle$, for
different thresholds in Fig. 1(b). The curves for all five threshold values $q$ (full
symbols) collapse onto a single curve, suggesting the existence of a scaling relation,

$$P_q(\tau) = \frac{1}{\langle\tau\rangle} f\left(\frac{\tau}{\langle\tau\rangle}\right). \tag{3}$$
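A minimal Python sketch of how return intervals and the scaled PDF of Eq. (3) might be computed is given here. The function names and the logarithmic binning are illustrative assumptions; the text does not specify how the histograms were binned.

```python
import numpy as np

def return_intervals(v, q):
    """Time intervals (in days) between consecutive values of v above threshold q."""
    exceed_days = np.flatnonzero(np.asarray(v) > q)
    return np.diff(exceed_days)

def scaled_pdf(intervals, n_bins=20):
    """Return (tau/<tau>, P_q(tau)*<tau>) estimated on logarithmic bins.

    Log-spaced bins are an assumption of this sketch, chosen because the
    tail of the distribution spans several orders of magnitude.
    """
    intervals = np.asarray(intervals, dtype=float)
    mean_tau = intervals.mean()
    # Intervals are integers >= 1, so the bins run from 1 to just past the maximum.
    edges = np.logspace(0, np.log10(intervals.max() + 1), n_bins + 1)
    density, edges = np.histogram(intervals, bins=edges, density=True)
    centers = np.sqrt(edges[1:] * edges[:-1])  # geometric bin centers
    keep = density > 0                         # drop empty bins
    return centers[keep] / mean_tau, density[keep] * mean_tau

# Toy usage: days 1, 4, 5 and 7 exceed q = 2, giving intervals 3, 1, 2.
toy_tau = return_intervals([0, 3, 0, 0, 3, 3, 0, 3], q=2)
x, y = scaled_pdf([1, 1, 2, 3, 10])
```

Plotting `y` against `x` on log-log axes for several thresholds is one way to check whether the curves collapse as Eq. (3) suggests.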

As the threshold $q$ increases, the curve (rare events) tends to be truncated due to the limited
size of the dataset. The tails of the scaling function can be approximated by a power-law
function as shown by the dashed line in Fig. 1(b),

$$f\left(\frac{\tau}{\langle\tau\rangle}\right) \sim \left(\frac{\tau}{\langle\tau\rangle}\right)^{-\gamma}, \tag{4}$$

where the tail exponent is $\gamma$. The exponent of the scaled PDFs for $q = 2$ is $\gamma = 3.2$ by the
least-squares method, which is the same as the unscaled PDF exponent $\gamma = 3.2$ as shown
in Fig. 1(a). The power-law exponents for intraday volume recurrence intervals of several
Chinese stock indices range from $\gamma = 1.71$ to $\gamma = 3.27$ [35]. Our exponents $\gamma$ are larger than
those in the Chinese stock markets. This might be due to differing definitions of volume
volatility. In Ref. [35], the volume volatility is defined as intraday volume divided by the
average volume at one specific minute of the trading day averaged over all trading days. Here
we define the volume volatility to be the logarithmic change in the successive daily volumes
[Eqs. (1) and (2)]. For comparison, and using the same approach, Figs. 1(c) and 1(d) show
the analogous results for price volatilities (see also the studies in Refs. [22, 23, 36]). Note
that it is not easy to distinguish between a stretched exponential and a power law when
studying price volatilities [22], i.e., the power-law range is small and a stretched exponential
could also provide a good fit. In contrast, the PDFs of the volume volatility return intervals
display a wide range of power-law tails, which differs from the stretched exponential tail
apparent in the price return intervals [23]. Our results for volume volatility may suggest
that $P_q(\tau)$ for price volatility is also a power law, but this could not be verified because
the range of the observed power-law regime [see Figs. 1(c) and 1(d)] is more limited than
the broad range of scales seen in the volume volatility [Figs. 1(a) and 1(b)]. The difference
between the power-law and stretched exponential behavior of $P_q(\tau)$ may be related to the
existence or non-existence, respectively, of non-linearity represented in the multifractality of
the time series. When non-linear correlations appear in a time record, Bogachev et al. [37]
showed that $P_q(\tau)$ is a power law. On the other hand, when non-linear correlations do
not exist and only linear correlation exists, Bunde et al. [26] found stretched exponential
behavior.
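A least-squares estimate of the tail exponent $\gamma$ in Eq. (4) can be sketched as a straight-line fit in log-log coordinates. The function name `fit_tail_exponent` and the cutoff `x_min` that marks the start of the tail are assumptions of this sketch; the text does not state which fitting range was used.

```python
import numpy as np

def fit_tail_exponent(x, y, x_min=1.0):
    """Least-squares estimate of gamma in y ~ x**(-gamma) for x >= x_min.

    x is the scaled interval tau/<tau>, y the scaled PDF P_q(tau)*<tau>.
    The cutoff x_min is a hypothetical choice for where the tail begins.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    tail = (x >= x_min) & (y > 0)
    # A power law is a straight line in log-log space; its slope is -gamma.
    slope, _intercept = np.polyfit(np.log(x[tail]), np.log(y[tail]), 1)
    return -slope

# Synthetic check on an exact power law with exponent 3.2 (not market data).
x = np.logspace(0, 2, 30)
y = 5.0 * x ** -3.2
gamma = fit_tail_exponent(x, y)
```

On empirical histograms the estimate depends on the chosen fitting range, so it is worth repeating the fit for a few values of `x_min`.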

A comparison with the shuffled records allows us to see how the empirical records differ
from randomized records. We shuffle the volume volatility time series to make a new
uncorrelated sequence of volatility, and then collect the time intervals above a given threshold
$q$ to obtain synthetic random control records. The curve that fits the shuffled records [the
open symbols in Fig. 1(b)] is an exponential function, $f(x) = e^{-ax}$, as expected for a Poisson
process. Poisson statistics indicate no correlation in the shuffled volatility data, whereas
the empirical records suggest strong correlations in the volatility.
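The shuffling control can be sketched as follows. The helper names, the fixed random seed, and the toy record are illustrative assumptions. Shuffling keeps the set of volatility values, and therefore the number of threshold exceedances, while destroying their temporal order.

```python
import numpy as np

def return_intervals(v, q):
    """Intervals (in days) between consecutive values of v above threshold q."""
    return np.diff(np.flatnonzero(np.asarray(v) > q))

def shuffled_return_intervals(v, q, seed=0):
    """Return intervals after randomly permuting v, which destroys temporal order."""
    rng = np.random.default_rng(seed)
    return return_intervals(rng.permutation(np.asarray(v)), q)

# Toy record with clustered large values (illustrative, not market data).
toy_v = np.array([3.0, 3.0, 3.0] + [0.5] * 30 + [3.0, 3.0, 3.0])
original = return_intervals(toy_v, q=2.0)
shuffled = shuffled_return_intervals(toy_v, q=2.0)
```

In the toy record the original intervals are either very short or very long, which is the signature of clustering. After shuffling, each day exceeds the threshold independently of its neighbors, so the intervals follow an approximately exponential (geometric) distribution.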

IV. FINANCIAL FACTORS 

We study how the scaled PDFs, $P_q(\tau)\langle\tau\rangle$ as a function of $\tau/\langle\tau\rangle$, depend on four
financial factors: (a) stock lifetime, (b) market capitalization, (c) mean volume, and (d) mean
trading value, for threshold $q = 2.0$. For higher $q$ values, we do not have sufficient data for
conclusive results [37]. In Fig. 2 we plot the scaled PDFs for these four factors. The volume
return intervals characterize the distribution of large volume movements. A high probability
of having a large volume return interval $\tau$ suggests a correlation in volume volatility, because
small volatilities are followed by small volatilities and the time interval between two large
volatilities becomes relatively longer than in random records. In order to characterize
how these four factors affect the distribution of volume return intervals, we divide all stocks
into four bins for each factor. In Fig. 2(a), the probability that $\tau$ will be large is greater
in the bin with 15-20 year old stocks (triangles) than in the bins of younger stocks. This
indicates that small volatilities (below the threshold) tend to follow small volatilities and
that the time intervals between large volatilities in the bin of 15-20 year old stocks are
larger than the time intervals in the bin of 5 year old stocks (dots). This also suggests that
the volume volatility time records of older stocks are more auto-correlated than those of
younger stocks. The decay rates, represented by the power-law exponents, are quite
different: $\gamma = 4.2$ for the shortest lifetime bin and $\gamma = 2.8$ for the longest lifetime bin. This
significant difference might be caused by differences in autocorrelation in these series.

In Figs. 2(b), 2(c), and 2(d), we show similar tendencies for stock bins with different
capitalizations, mean volumes, and mean trading values. Trading value is defined as stock
price multiplied by transaction volume. For each stock, we designate the lifetime average of
capitalization, volume, and trading value as performance indices. For example, the
power-law exponents of the PDFs, $P_q(\tau)\langle\tau\rangle$, decrease as the capitalization becomes larger [see
Fig. 2(b)]. To clarify the picture, we divide all stocks into different subsets and study the
behavior of the power-law exponent $\gamma$ with regard to these four factors. In Fig. 3(a), stocks
are sorted into 10 subsets, from 508 days (2 years) to 5080 days (20 years). We fit the
power-law tails of the volume return intervals for each subset and plot the exponent $\gamma$ versus the
lifetime of the stocks. In Fig. 3(a), we can observe a systematic trend with stock lifetime:
older stock subsets have a smaller exponent $\gamma$, which indicates a stronger correlation.
Similarly, we sort the stocks by capitalization, mean volume, and mean trading
value, as shown in Figs. 3(b), 3(c), and 3(d). It is seen that $\gamma$ decreases as each of
these three factors increases, but seems to become constant for large values of capitalization,
mean volume, and mean trading value.

Since all factors similarly affect the scaling of the PDF, $P_q(\tau)\langle\tau\rangle$, we now determine how
much these factors are correlated. To study the relations between different stock bins, we plot
trading value versus capitalization, mean volume versus capitalization,
and mean trading value versus mean volume for all the stocks in Fig. 4. We see that
larger capitalization stocks tend to have a larger trading volume and a larger trading value,
which is consistent with the similar trends in panels (b), (c), and (d) of Figs. 2 and 3. The
correlation coefficients between trading value and capitalization, mean volume and
capitalization, and trading value and volume are 0.62, 0.55, and 0.78, respectively. The
correlation coefficients are high because capitalization, volume, and trading value are all
affected by firm size. Our analyses do not, however, show a significant relationship between
stock lifetime and trading value, capitalization, or mean volume: the correlation coefficients
are all < 0.20.



V. SHORT-TERM MEMORY EFFECTS 

We characterize a sequence of volume return intervals in terms of the autocorrelations in
the time series. If the volume return intervals are uncorrelated and independent of
each other, their sequence is determined only by the probability distribution. On the
other hand, if the series is auto-correlated, the preceding value will have a memory effect on
the values following in the sequence of volume volatility return intervals.

In order to investigate whether short-term memory is present, we study the conditional
PDF, $P_q(\tau|\tau_0)$, which is the probability of finding a volume return interval $\tau$ immediately
after an interval of size $\tau_0$. In records without memory, $P_q(\tau|\tau_0)$ should be identical to $P_q(\tau)$
and independent of $\tau_0$. Otherwise, $P_q(\tau|\tau_0)$ should depend on $\tau_0$. Because the statistics for
$\tau_0$ of a single stock are of poor quality, we study $P_q(\tau|\tau_0)$ for a range of $\tau_0/\langle\tau\rangle$. The entire
dataset is partitioned into eight equal-sized subsets, $Q_1, Q_2, \ldots, Q_8$, with intervals of increasing
size $\tau_0/\langle\tau\rangle$. Figure 5 shows the PDFs $P_q(\tau|\tau_0)$ for $Q_2$, i.e., small interval size
$0.2 < \tau_0/\langle\tau\rangle < 0.4$, and for $Q_6$, i.e., large interval size $3.2 < \tau_0/\langle\tau\rangle < 6.4$, for different $q$.
The probability of finding large $\tau/\langle\tau\rangle$ is larger in $Q_6$ (open symbols) than in $Q_2$ (full
symbols), while the probability of finding small $\tau/\langle\tau\rangle$ is larger in $Q_2$ than in $Q_6$. Thus
large $\tau_0$ tends to be followed by large $\tau$, and vice versa, which indicates short-term memory
in the volume return interval sequence. Moreover, note that the curves $P_q(\tau|\tau_0)$ in the same
subset for different thresholds $q$ fall onto a single curve, which indicates the existence of a
unique scaling for the conditional PDFs as well. Similar results were found for the volume
volatility of the Chinese markets [35] and for price volatilities [22, 23].
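The conditioning step can be sketched in Python as selecting every interval whose predecessor falls in a given range of $\tau_0/\langle\tau\rangle$. The function name and the toy interval sequence are illustrative assumptions; the two ranges used in the example are chosen to suit the toy data and are not the paper's subsets.

```python
import numpy as np

def conditional_intervals(intervals, lo, hi):
    """Scaled intervals tau/<tau> that immediately follow a tau0 with lo < tau0/<tau> <= hi."""
    scaled = np.asarray(intervals, dtype=float)
    scaled = scaled / scaled.mean()
    tau0, tau = scaled[:-1], scaled[1:]  # (predecessor, successor) pairs
    return tau[(tau0 > lo) & (tau0 <= hi)]

# Toy interval sequence with clustering (illustrative, not market data).
toy_intervals = [1, 1, 1, 8, 8, 8, 1, 1]
after_short = conditional_intervals(toy_intervals, 0.2, 0.4)
after_long = conditional_intervals(toy_intervals, 1.6, 3.2)
```

A histogram of `after_short` and `after_long` gives an estimate of $P_q(\tau|\tau_0)$ for the two subsets. In a record with memory, as in the toy sequence, the intervals following long $\tau_0$ are longer on average than those following short $\tau_0$.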

VI. LONG-TERM MEMORY EFFECTS

In previous studies, the price volatility series was shown to have long-term correlations.
Using a similar approach, we test whether the volume volatility sequence also possesses
long-term correlations. To answer this question, we employ the detrended fluctuation analysis
(DFA) method [38-40] to further reveal memory effects in the volume volatility series. Using
the DFA method, we divide an integrated time series into boxes of equal length $n$ and fit
a least-squares line in each box. Next we compute the root-mean-square fluctuation $F(n)$
of the detrended time series within a window of $n$ points and determine the correlation
exponent $\alpha$ from the scaling function $F(n) \sim n^{\alpha}$, where $\alpha \in [0, 1]$. The correlation exponent
$\alpha$ characterizes the autocorrelation in the sequence. The time series has a long-term memory
and a positive correlation if the exponent $\alpha > 0.5$, indicating that large values tend
to follow large values and small values tend to follow small values. The time series is
uncorrelated if $\alpha = 0.5$ and anti-correlated if $\alpha < 0.5$.
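The DFA procedure described above can be sketched compactly in Python. This is a first-order (linear detrending) variant with non-overlapping boxes; the function name, the box sizes, and the synthetic test series are assumptions of this sketch rather than the authors' settings.

```python
import numpy as np

def dfa_exponent(series, box_sizes):
    """Estimate the DFA correlation exponent alpha with linear detrending."""
    series = np.asarray(series, dtype=float)
    # Integrate the mean-subtracted series to obtain the profile.
    profile = np.cumsum(series - series.mean())
    fluctuations = []
    for n in box_sizes:
        n_boxes = len(profile) // n
        # Non-overlapping boxes of length n; leftover points are discarded.
        boxes = profile[: n_boxes * n].reshape(n_boxes, n).T  # shape (n, n_boxes)
        t = np.arange(n)
        # Fit a least-squares line in every box at once and subtract it.
        coeffs = np.polyfit(t, boxes, 1)                      # shape (2, n_boxes)
        trend = np.outer(t, coeffs[0]) + coeffs[1]
        # Root-mean-square fluctuation F(n) of the detrended profile.
        fluctuations.append(np.sqrt(np.mean((boxes - trend) ** 2)))
    # alpha is the slope of log F(n) versus log n.
    alpha, _intercept = np.polyfit(np.log(box_sizes), np.log(fluctuations), 1)
    return alpha

# Uncorrelated synthetic noise should give alpha near 0.5.
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(10_000)
alpha = dfa_exponent(white_noise, box_sizes=[16, 32, 64, 128, 256])
```

Applied to a volume volatility series, a value of `alpha` clearly above 0.5 would point to long-term positive correlations, under the caveat that short records limit the usable range of box sizes.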

Using the DFA method, we analyze the price volatility and volume volatility time series
by plotting in bins the relation between the correlation exponent $\alpha$ and the four financial
factors: stock lifetime, market capitalization, mean trading volume, and mean trading
value. All the price volatility and volume volatility correlation exponents are significantly
larger than 0.5, suggesting the presence of long-term memory in both price volatility
sequences and volume volatility sequences. In all of the plots, the price volatility series shows
a stronger long-term correlation than the volume volatility series. Moreover, as shown in
Fig. 6(a), $\alpha$ on average increases for the stocks with a lifetime ranging from 350 days to 3800
days (about 15 years), and then shows a slight decrease, suggesting that long-lasting stocks
tend to have a persistent price and volume movement on large scales. The increasing
exponent $\alpha$ indicates that the volume volatility of older stocks is more correlated than that of
younger stocks. This is consistent with the indication in Fig. 2(a) that the volume volatility
of older stocks is more auto-correlated. Figures 6(b), 6(c), and 6(d) show that there is
no systematic relation between $\alpha$ and market capitalization, trading volume, or
trading value.

VII. CONCLUSIONS

We have shown the scaling properties and memory effect of volume volatility return
intervals in large stock records of the U.S. market. The scaled distribution of volume volatility
return intervals displays unique power-law tails for different thresholds $q$. We also find
different power-law exponents $\gamma$ of $P_q(\tau)$ for the four essential stock factors: stock lifetime,
market capitalization, average trading volume, and average trading value. These different
exponents may be related to long-term correlations in the interval series. Significantly, the
daily volume volatility exhibits long-term correlations, similar to those found for price
volatility. The conditional probability $P_q(\tau|\tau_0)$ for $\tau$ following a certain interval $\tau_0$ indicates
that volume return intervals are short-term correlated.

FIG. 1: (Color online) Probability distributions of volume volatility return intervals and price
volatility return intervals for 17,197 stocks. Full symbols with different shapes represent different
thresholds $q$ varying from 2.0 to 4.0. (a) Distribution of volume volatility return intervals, $P_q(\tau)$
versus $\tau$. (b) Scaled distribution of volume return intervals (full symbols), $P_q(\tau)\langle\tau\rangle$ versus
$\tau/\langle\tau\rangle$, and distribution of volume return intervals for shuffled volatility records (open symbols).
The four curves with full symbols collapse onto one single curve, indicating a universal scaling
function. The tail of the scaling function is approximately a power-law distribution,
$f(x) \sim x^{-\gamma}$, with $\gamma = 3.2$, while the curve fitting the shuffled records is an exponential
function, $f(x) = e^{-ax}$, as expected for a Poisson process. Poisson statistics indicate no
correlation in the shuffled volatility data, but the original dataset suggests strong correlation in
the volatilities. The power-law exponents for intraday volume recurrence intervals of several
Chinese stock indices range from $\gamma = 1.71$ to $\gamma = 3.27$ [35]. For comparison, (c) and (d) show
the distribution and scaled distribution of price volatility return intervals, respectively. Note the
narrow range of the power law compared to (a).

FIG. 2: (Color online) Relations between the distribution function $P_q(\tau)\langle\tau\rangle$ of volume volatility
return intervals and four financial factors: (a) lifetime, (b) market capitalization, (c) average daily
trading volume, (d) average daily trading value, for the threshold $q = 2.0$. The distribution
functions decay with various exponents $\gamma$ and show a similar systematic tendency for the four
financial factors.

FIG. 3: The power-law tail exponent $\gamma$ for different subsets of stocks. (a) Stocks are sorted into
10 subsets of different lifetime. Exponents $\gamma$ are obtained by fitting the PDF of volume volatility
return intervals for each subset; (b) stocks are sorted into 8 subsets for different capitalization; (c)
stocks are sorted into 11 subsets for different mean volume; (d) stocks are sorted into 9 subsets for
different trading value.

FIG. 4: Scatter plots for the relations in stocks between trading value and capitalization, mean
volume and capitalization, and trading value and mean volume for all 17,197 stocks. For example, a
point on panel (a) represents a stock that has a capitalization of $10^8$ dollars and an average
trading value of $10^6$ dollars. The correlation coefficients between trading value and capitalization,
mean volume and capitalization, and trading value and volume are 0.62, 0.55, and 0.78, respectively.

FIG. 5: (Color online) Conditional PDF $P_q(\tau|\tau_0)$ of volume volatility return intervals $\tau$ for
different thresholds $q = 2.0, 2.5, 3.0$, as a function of $\tau/\langle\tau\rangle$ for different $\tau_0/\langle\tau\rangle$ bins. A small
$\tau_0$ subset $Q_2$ (full symbols) and a large $\tau_0$ subset $Q_6$ (open symbols) are displayed in (a). For
example, subset $Q_6$ contains events of finding $\tau$ after a large interval $3.2 < \tau_0/\langle\tau\rangle < 6.4$. In
contrast to subset $Q_6$, subset $Q_2$ has a larger probability of being followed by small $\tau/\langle\tau\rangle$ and a
smaller probability of being followed by large $\tau/\langle\tau\rangle$, which indicates short-term correlation in the
records: small intervals are followed by small intervals and large intervals are followed by large
intervals. There is no memory effect in the shuffled records, as seen in (b), where the PDFs of all
the subsets collapse onto one curve.

FIG. 6: (Color online) Correlation exponent $\alpha$ obtained from detrended fluctuation analysis (DFA)
of volume volatility (square) and price volatility (triangle). The plot shows the relation between
$\alpha$ and four factors: (a) lifetime, (b) market capitalization, (c) average daily trading volume, (d)
average daily trading value, for the threshold $q = 2.0$.